Abstract. Traditionally stationarity refers to shift invariance of the 
distribution of a stochastic process. In this paper, we rediscover 
stationarity as a path property instead of a distributional property. More 
precisely, we characterize a set of paths denoted as A, which corresponds 
to the notion of stationarity. On one hand, the set A is shown to be 
large enough, so that for any stationary process, almost all of its paths 
are in A. On the other hand, we prove that any path in A will behave in 
the optimal way under any stationarity test satisfying some mild 
conditions. The results provide a unified framework to understand and assess 
the existing time series tests for stationarity, and can potentially lead 
to new families of stationarity tests. 


1. Motivation 


Stationarity plays an important role in time series analysis. Many 
statistical properties of a time series rely on the assumption that the time 
series is above all stationary. Tests for stationarity therefore become 
crucial and should be applied as a preliminary step in many analyses. In the 
time series literature, various tests have been proposed. Many existing tests 
to discriminate between stationarity and nonstationarity rely on the concept 
of a unit root, such as the Dickey-Fuller type tests proposed for instance 
by Dickey and Fuller (1979) and the KPSS type tests proposed for instance 
by Kwiatkowski, Phillips, Schmidt and Shin (1992), respectively. The first 
type of tests has a unit root as the null hypothesis, while the second type 
of tests has stationarity as the null hypothesis. However, the unit-root 
concept is specifically defined for linear autoregressive models with 
finite-variance disturbances. As a result, many of the existing tests based 
on the unit-root concept are not always suitable for examining generic 
stationarity or stability properties of time-series processes. 

There exist a few tests based on ideas more directly related to 
stationarity. In the time domain, Xiao and Lima (2007) for instance proposed 
a test which works against alternatives with time-varying second moments. 
Further tests have also been developed in the frequency domain, using 
spectral decomposition and wavelets. To name a few, we cite the pioneering 
work by Priestley and Subba Rao (1969), followed by von Sachs and Neumann 
(2000) and Nason (2013). Their approach can be regarded as a mixture of 
the analysis in the frequency domain and in the time domain, in the sense 
that they check the constancy of the result of the spectral decomposition 
across time. Dwivedi and Subba Rao (2011) constructed a test purely in 
the frequency domain by considering the correlation of the discrete Fourier 
transform at the canonical frequencies. 

In principle, all of the tests that we cited so far are tests for second-order 
stationarity, also known as “weak” stationarity. However, the tests in the 
time domain can be modified to test for strict stationarity by incorporating 
information from different levels. This thread of works includes Kapetanios 
(2007), Busetti and Harvey (2010), and Lima and Neri (2013). Other tests 
for strict stationarity rely on more specific assumptions such as the Markov 
property. See, for example, Domowitz and El-Gamal (2001) and Kanaya (2011). 
It should also be pointed out that researchers do not always draw a clear 
distinction between tests that are designed to test for strict stationarity and 
tests that are designed to test for second-order stationarity, due to the logical, 
technical and historical links between these two concepts. 

Stationarity tests for time series are unique relative to their counterparts 
for stochastic processes in general, where a number of independent or 
correlated paths are often available. For time series, typically only one path 
(or realization) is available, and all of the conclusions about the time series 
must be drawn based on the information extracted from this single path. Thus, 
in some sense, stationarity tests for time series transform stationarity very 
naturally from a distributional property to a path property, with each 
particular stationarity test dividing the path space into a 
“stationary/acceptance region” and a “non-stationary/rejection region”. 

A careful reader would point out that the above argument is not sufficient 
to transform stationarity into a path property, since the same reasoning 
works for all of the properties for which time series tests exist. However, 
there is a fundamental difference between path properties and distributional 
properties in terms of the results produced by the time series tests. For a path 
property, such as monotonicity, exceedance of a threshold, etc., assuming 
that we have a large enough data set, all of the “reasonable” tests should 
give similar results for a fixed path, since there is a definite answer to the 
question as to whether the given path possesses this property. In contrast, 
different tests normally give different results if the property of interest is 
a distributional property, such as Gaussianity or the Markov property. In this 
case, the answer will depend on the test used, or more precisely, the mechanism 
upon which the tests are constructed. 

Logically, a stationarity test for time series should capture some 
“essential” properties possessed by “typical” (e.g., almost all) paths of 
stationary processes, and it should be used to verify whether the given path 
has this property. Equivalently, the test can also be used to verify the 
existence of some traits which should be essentially absent in a stationary 
process, and utilize this result as a basis to reject the null hypothesis of 
stationarity. Following the reasoning in the last paragraph, the critical 
question is: what properties are deemed to be “essential” in distinguishing 
between stationarity and non-stationarity, and will we obtain the same result 
for a given path when different properties are used for evaluation? 

In principle, any property which is satisfied by all of the stationary 
processes with a higher probability than the non-stationary processes, or the 
opposite case, should work. There are so many of them that it seems to 
be hopeless to come up with a clear idea about what such a property should 
look like. On the other hand, interestingly, it seems that we have a relatively 
clear notion about which paths are “stationary”, or more precisely, which are 
not. Let us consider the following examples: 

Let $X = \{X_n\}_{n \in \mathbb{N}_0}$ be a time series over an infinite time 
horizon, where $\mathbb{N}_0$ stands for the set of all non-negative integers. 
Let $H$ be the path space $\mathbb{R}^{\mathbb{N}_0}$ equipped with the 
cylindrical $\sigma$-field. 

Example 1.1. If $x = \{x_n\}_{n \in \mathbb{N}_0}$ is strictly increasing, then 
the corresponding time series should not be stationary, since 
$P(X \text{ is strictly increasing}) = 0$ for any stationary time series $X$. 

Example 1.2. If there exists $k$ such that 
$x_k > \sup_{i \in \mathbb{N}_0,\, i \neq k} x_i$, then the time 
series should not be stationary. Intuitively, with probability 1, a stationary 
time series does not have a peak which is never attainable again. 

Given the above examples, it might be tempting to argue that since each 
path is special in a certain sense, it will be rejected for stationarity by some 
tests. In other words, the abundance of the criteria which can be used for 
stationarity will result in an empty intersection for their acceptance regions in 
the path space. If this is the case, then stationarity should not be considered 
as a path property, because it means that the result of a stationarity test 
for a given path solely depends on the properties upon which the test is 
constructed. This, however, turns out not to be the case. In fact, there exist 
paths which should not be excluded from stationarity in any case, as shown 
by the following examples. 

Example 1.3. Let $x = (c, c, \dots)$ be a sequence of constant $c \in \mathbb{R}$. 
Then one should not conclude that $x$ is not stationary. Actually, if a 
stationarity test rejects such a path, then for this constant stationary 
process, its type I error will be identically equal to 1. 

Example 1.4. Let $x = (x_0, x_1, \dots)$, where $x_n = \sin(n\theta + \varphi_0)$, 
$n \in \mathbb{N}_0$. This is a wave with period $2\pi/\theta$ and phase 
$\varphi_0$, observed at integer times. Notice that if we make $\varphi_0$ 
random and uniformly distributed on $[0, 2\pi)$, then $x$ becomes a stationary 
process. Therefore if we consider that all of the phases are equal in 
determining whether the path $x$ is stationary, which seems an irrefutable 
argument, then such $x$ should not be rejected for stationarity when tested. 
This example extends to all of the periodic functions observed at integers. 
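
The phase argument in Example 1.4 can be checked numerically. The following Python sketch is only an illustration under assumptions that are not part of the original text: the frequency $\theta = 1$ (chosen so that $\theta / 2\pi$ is irrational), the interval $(0.5, 1)$, the prefix length, and the helper name `visit_frequency` are all hypothetical choices. It estimates the fraction of time that the path spends in the interval for several fixed phases. Under these assumptions each estimate should be close to $1/3$, which is the probability that $\sin(U)$ falls in $(0.5, 1)$ when $U$ is uniform on $[0, 2\pi)$.

```python
import math


def visit_frequency(theta, phase, a, b, n):
    """Fraction of indices i in [0, n-1] with a < sin(i*theta + phase) < b."""
    hits = sum(1 for i in range(n) if a < math.sin(i * theta + phase) < b)
    return hits / n


# Hypothetical parameters: theta = 1 makes theta / (2*pi) irrational,
# so the path is not periodic on the integers but still "looks stationary".
phases = (0.0, 1.0, 2.5)
frequencies = [visit_frequency(1.0, phase, 0.5, 1.0, 200_000) for phase in phases]

for phase, freq in zip(phases, frequencies):
    print(f"phase={phase}: frequency in (0.5, 1) = {freq:.4f}")
```

A finite prefix can of course only suggest the limiting frequency; it does not prove that the limit exists.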


The examples above show how a strong, intuitive distinction between 
stationary and non-stationary paths exists in our mind, which enables us to tell 
the non-stationary paths from the stationary ones even before we venture 
into finding an appropriate set of criteria to discriminate them. Thus such 
an intuitive distinction should be built upon some principles more 
fundamental than the numerous specific path properties such as monotonicity, 
the number of peaks, etc. 

The goal of our paper is to flesh out these principles, and to show that 
they actually form the basis for most existing stationarity tests. In 
particular, there are three conditions underlying any stationarity test. Roughly 
speaking, the first condition requires that for any event of a certain type, 
if it happens once, it must happen infinitely many times along the path, 
with a positive limiting frequency; the second condition is a mild condition 
which prevents any non-negligible part of the path from escaping to infinity; 
and the third condition is more of a technical nature, and is related to the 
ergodicity of the path. 

The three conditions mentioned above identify a set of paths, denoted as 
set A. We show that this is exactly the set of all of the paths which should 
be classified as “stationary”. We firstly prove that the set A is large enough, 
such that it contains almost all of the paths of any stationary process; then 
we show that the set A is also small enough, such that it only includes 
those paths which yield the best possible results under any given stationarity 
test. Thus, this justifies the idea that the notion of stationarity can be 
transformed profitably into a path property, and that the path space can 
be divided into an “essentially stationary” part and its complement. These 
results also show how the three proposed conditions can usefully serve as a 
basis for our intuition about the distinction between stationarity and non- 
stationarity, and provide a unified framework to understand and assess the 
existing stationarity tests. 

The rest of the paper is organized as follows. In Section 2 we introduce the 
basic set-up and construct the set A of all the “stationary” paths. Section 3 
shows that the set A is large enough to contain almost all of the paths for any 
stationary process. A practical criterion to check one of the conditions that 
defines A is also established. Finally, in Section 4 we prove that A is also 
small enough, so that any path in A will be statistically indistinguishable 
from a typical path of a certain stationary process, in the sense that it will 
behave optimally under any stationarity test satisfying some mild conditions. 


2. Basic Set-up 


Let $x = \{x_n\}_{n \in \mathbb{N}_0}$ be a numerical sequence in $\mathbb{R}$. 
For $k \in \mathbb{N}$, define $I = I_0 \times \dots \times I_{k-1} \in \mathcal{I}^k$, 
where $\mathcal{I}$ is the collection of open intervals on the real line. 
Define a set $S_k^I = S_k^I(x)$ of non-negative integers by 

$$S_k^I(x) := \{n \ge 0 : x_n \in I_0, \dots, x_{n+k-1} \in I_{k-1}\}.$$

Denote by $N_k^I = \{N_k^I(n)\}_{n \in \mathbb{N}}$ the counting function of 
$S_k^I$. That is, 

$$N_k^I(n) = |S_k^I \cap [0, n-1]|,$$

where $|\cdot|$ for a set gives the number of elements in the set. We say that 
Property E holds for $x$, with parameters $k$ and $I$, if the corresponding 
$N_k^I$ satisfies that either $N_k^I(\infty) = 0$, or 
$\lim_{n \to \infty} \frac{N_k^I(n)}{n} > 0$. 

Define the density of a set $S \subseteq \mathbb{N}_0$ as 
$\lim_{n \to \infty} \frac{|S \cap [0, n-1]|}{n}$, if the limit exists. 
Then Property E says that $S_k^I$ either is empty or has a positive density. 

Let $A_0$ be the set of all the numerical sequences such that Property E 
holds for all $k \in \mathbb{N}$ and $I \in \mathcal{I}^k$. 
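
To make the counting function concrete, here is a minimal Python sketch that evaluates $N_k^I(n)/n$ on a finite prefix of a sequence. The helper names `pattern_indices` and `counting_function`, the two test sequences, and the chosen intervals are illustrative assumptions, not part of the original set-up; a finite prefix can only hint at whether Property E holds, since the property concerns the limit $n \to \infty$.

```python
def pattern_indices(x, intervals):
    """S_k^I on a finite prefix: indices n with x[n+j] in intervals[j], j = 0..k-1."""
    k = len(intervals)
    return [
        n
        for n in range(len(x) - k + 1)
        if all(a < x[n + j] < b for j, (a, b) in enumerate(intervals))
    ]


def counting_function(x, intervals, n):
    """N_k^I(n) = |S_k^I intersected with [0, n-1]|, computed on the prefix x."""
    return sum(1 for s in pattern_indices(x, intervals) if s <= n - 1)


alternating = [i % 2 for i in range(1000)]  # 0, 1, 0, 1, ...
increasing = list(range(1000))              # 0, 1, 2, ... (as in Example 1.1)

# k = 2, I = (-0.5, 0.5) x (0.5, 1.5): the pattern "0 followed by 1".
ratio_alternating = counting_function(alternating, [(-0.5, 0.5), (0.5, 1.5)], 1000) / 1000
# k = 1, I = (2.5, 3.5): visited exactly once by the increasing path.
ratio_increasing = counting_function(increasing, [(2.5, 3.5)], 1000) / 1000

print(ratio_alternating, ratio_increasing)
```

In this sketch the alternating path visits the pattern with ratio 0.5, while the strictly increasing path visits its interval once, so the ratio 0.001 shrinks toward zero as the prefix grows even though $N_k^I(\infty) > 0$; this is the behavior that Property E rules out.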

We further add a tightness condition, called Property T: 

$$\lim_{K \to \infty} \lim_{n \to \infty} \frac{1}{n} \sum_{i=1}^{n} 1_{\{|x_i| < K\}} = \lim_{K \to \infty} \lim_{n \to \infty} \frac{N_1^{(-K,K)}(n)}{n} = 1.$$

Intuitively, Property T prevents the “main part” of the sequence from 
escaping to infinity. We denote by $A_1$ the subset of $A_0$ consisting of all 
of the sequences in $A_0$ which satisfy Property T. 
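
As a rough numerical illustration of Property T, the sketch below computes the fraction of a finite prefix lying in $(-K, K)$. The function name `fraction_within` and the three example sequences are hypothetical choices made for this illustration only; the double limit in Property T cannot be verified from finitely many terms.

```python
def fraction_within(x, K):
    """N_1^{(-K, K)}(n) / n for the finite prefix x, with n = len(x)."""
    return sum(1 for value in x if -K < value < K) / len(x)


n = 10_000
bounded = [i % 2 for i in range(n)]                      # stays in {0, 1}
escaping = list(range(n))                                # x_i = i drifts to infinity
spiky = [i if i % 100 == 0 else 0.5 for i in range(n)]   # 1% of the terms escape

for name, seq in (("bounded", bounded), ("escaping", escaping), ("spiky", spiky)):
    print(name, fraction_within(seq, 100))
```

With $K = 100$ the bounded sequence keeps all of its mass, the escaping sequence keeps only the first 100 terms, and the spiky sequence keeps about 99% of its terms no matter how large $K$ is taken relative to a growing prefix, so a non-negligible part of it escapes to infinity.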

Denote by $F_1^n$, $n \in \mathbb{N}$, the marginal empirical measures of a 
sequence $x \in A_1$, determined by 

$$F_1^n(I) = \frac{N_1^I(n)}{n}, \quad I \in \mathcal{I}.$$

The fact that $x \in A_0$ implies that $\lim_{n \to \infty} F_1^n(I)$ always 
exists. Property T then guarantees that the sequence of measures 
$\{F_1^n\}_{n \in \mathbb{N}}$ is tight, and hence 
$\lim_{n \to \infty} F_1^n(I)$ generates a probability measure. More generally, 
for any $k \in \mathbb{N}$, the $k$-dimensional empirical measure is defined by 

$$F_k^n(I) = \frac{N_k^I(n)}{n}, \quad I \in \mathcal{I}^k.$$

It is easy to see that Property T also assures the tightness of any 
finite-dimensional empirical measures, and thus $\lim_{n \to \infty} F_k^n(I)$ 
generates a probability measure on $\mathbb{R}^k$. 

Together, the family of limiting probability measures 
$\{\lim_{n \to \infty} F_k^n\}_{k \in \mathbb{N}}$ satisfies the consistency 
condition, and thus by Kolmogorov’s existence theorem, there exists a 
stationary process $Y = \{Y_n\}_{n \in \mathbb{N}_0}$, such that any 
finite-dimensional distribution of $Y$ satisfies 

$$F_{Y_0, \dots, Y_{k-1}} = \lim_{n \to \infty} F_k^n.$$

The process $Y = Y^x$ is unique in distribution since all of its 
finite-dimensional distributions are completely determined by the empirical 
measures of the sequence $x$. We call $Y^x$ the stationary process induced by 
the numerical sequence $x \in A_1$. 

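Define the set 

$$A := \{x \in A_1 : Y^x \text{ is ergodic}\}.$$

Also, notice that to make $Y^x$ well-defined, we only need a weaker version 
of Property E, where $\lim_{n \to \infty} \frac{N_k^I(n)}{n}$ exists for any 
$k \in \mathbb{N}$ and $I \in \mathcal{I}^k$, but $N_k^I(\infty) > 0$ does not 
necessarily imply $\lim_{n \to \infty} \frac{N_k^I(n)}{n} > 0$. 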

3. Coverage by A of Paths from Stationary Processes 

The following theorem shows that the set A is large enough, so that every 
stationary time series puts mass 1 on A. 

Theorem 3.1. Let $X = \{X_n\}_{n = 0, 1, \dots}$ be a stationary time series. 
Then $P(X \in A) = 1$. 

Proof. First, by ergodic decomposition, it suffices to prove the result for the 
case where $X$ is ergodic. Moreover, for an ergodic process $X$, once we prove 
that $P(X \in A_0) = 1$, it follows immediately that $P(X \in A) = 1$ as well, 
since Property T and the ergodicity of the path are guaranteed by the pointwise 
ergodic theorem. Thus it suffices to prove that $P(X \in A_0) = 1$. 

The fact that Property E holds for any fixed $k$ and any single $I$ almost 
surely is a trivial consequence of the pointwise ergodic theorem. As a result, 
Property E also holds for any countable set of $(k, I)$ almost surely. In the 
rest of the proof, for ease of notation, we will focus on the case where 
$k = 1$, and prove that Property E holds for all $I \in \mathcal{I}$ almost 
surely. The cases for $k > 1$ follow in a similar way. 

Let $F_1$ be the marginal distribution of $X_k$ for any $k = 0, 1, \dots$. 
Denote by $D_1$ the set of atoms of $F_1$: 

$$D_1 = \{a \in \mathbb{R} : F_1(\{a\}) > 0\},$$

and $D = D_1 \cup \mathbb{Q} \cup \{-\infty, \infty\}$; then both $D_1$ and $D$ 
are at most countable sets. Hence the set 

$$A_2 := \{x \in \mathbb{R}^{\mathbb{N}_0} : \text{Property E holds for } k = 1 \text{ and any } I = (a, b),\ a, b \in D\}$$

satisfies $P(X \in A_2) = 1$. Thus from now on we can assume that the paths 
are in $A_2$. 

For any open interval $(a, b)$, there exists an increasing sequence of open 
intervals $\{(a_i, b_i)\}_{i = 1, 2, \dots}$, such that $a_i, b_i \in D$ for 
$i = 1, 2, \dots$, and $(a, b) = \bigcup_i (a_i, b_i) = \lim_{i \to \infty} (a_i, b_i)$. 
Let the corresponding sets be $S$ and $S_i$, and the corresponding counting 
functions be $N(n)$ and $N_i(n)$. By construction, $S = \lim_{i \to \infty} S_i$, 
and $N(n) = \lim_{i \to \infty} N_i(n)$ for $n \in \mathbb{N}$. Suppose 
$N(\infty) > 0$ but $\lim_{n \to \infty} \frac{N(n)}{n} = 0$ for some path in 
$A_2$; then for $i$ large enough, we also have $N_i(\infty) > 0$ and 
$\lim_{n \to \infty} \frac{N_i(n)}{n} = 0$, which contradicts the construction 
of $A_2$. Therefore the only possibility that a path $x$ is in 
$A_2 \setminus A_0$ is that the corresponding ratio $\frac{N(n)}{n}$ does not 
admit a limit as $n \to \infty$. 

By the pointwise ergodic theorem, for any fixed open interval $I$, we have 

$$\frac{\sum_{i=0}^{n-1} 1_{\{X_i \in I\}}}{n} \to E\big(1_{\{X_0 \in I\}}\big) = P(X_0 \in I)$$

almost surely. Thus if we define the set 

$$B := \Big\{x : \frac{\sum_{i=0}^{n-1} 1_{\{x_i \in I\}}}{n} \to P(X_0 \in I) \text{ for all } I = (a, b),\ a, b \in D\Big\},$$

then $P(B) = P(A_2 \cap B) = 1$. As a result, we can almost surely assume that 
$x \in A_2 \cap B$. 

Suppose that for such an $x$ and for an open interval $I = (a, b)$, 
$a, b \in \mathbb{R}$, the corresponding ratio $\frac{N(n)}{n}$ does not admit 
a limit as $n \to \infty$. Without loss of generality, assume that $a \in D$ 
and $b \notin D$. The cases where $a \notin D$, $b \in D$ and where 
$a \notin D$, $b \notin D$ are similar. The non-existence of the limit implies 
that 

$$u := \limsup_{n \to \infty} \frac{N(n)}{n} > \liminf_{n \to \infty} \frac{N(n)}{n} =: d.$$

By definition, for any $b' \in D \cap (a, b)$, 

$$\lim_{n \to \infty} \frac{\sum_{i=0}^{n-1} 1_{\{x_i \in (a, b')\}}}{n} \le \liminf_{n \to \infty} \frac{N(n)}{n} = d. \tag{1}$$

On the other hand, for $b'' \in D \cap (b, \infty)$, 

$$\lim_{n \to \infty} \frac{\sum_{i=0}^{n-1} 1_{\{x_i \in (a, b'']\}}}{n} \ge \limsup_{n \to \infty} \frac{N(n)}{n} = u. \tag{2}$$

The limit above exists because 

$$\sum_{i=0}^{n-1} 1_{\{x_i \in (a, b'']\}} = \sum_{i=0}^{n-1} 1_{\{x_i \in (a, \infty)\}} - \sum_{i=0}^{n-1} 1_{\{x_i \in (b'', \infty)\}}.$$

Subtracting (1) from (2), we have 

$$\lim_{n \to \infty} \frac{\sum_{i=0}^{n-1} 1_{\{x_i \in [b', b'']\}}}{n} \ge u - d > 0$$

for any $b', b'' \in D$ with $b' < b < b''$. Recall that since we work with 
$A_2 \cap B$, this also implies that 

$$P(X_0 \in [b', b'']) \ge u - d.$$

Because $D$ is dense in $\mathbb{R}$, we can take $b' \uparrow b$ and 
$b'' \downarrow b$, leading to the result 

$$P(X_0 = b) \ge u - d > 0.$$

However, since $b \notin D$, $b$ is not an atom of $F_1$. Thus 
$P(X_0 = b) = 0$, which is a contradiction. Hence the assumption is almost 
surely false and the limit exists with probability 1. □ 

In reality, checking the ergodicity of $Y^x$ for a given $x$ by definition 
first requires us to fully recover the distribution of $Y^x$ from $x$, and then 
to determine whether the process $Y^x$ is ergodic given its distribution. 
Unfortunately neither of these two steps is practical. However, for a given 
sequence $x$, we can derive an equivalent characterization of the ergodicity, 
which is directly built upon the behavior of the sequence rather than the 
property of the measure it induces. 

Definition 3.2. An asymptotically proportional contraction of the index 
set $\mathbb{N}_0$ is a subset $G$ of $\mathbb{N}_0$ consisting of disjoint 
intervals $G_i$ of consecutive integers: 

$$G = \bigcup_{i=1}^{\infty} G_i,$$

satisfying 

(1) $G_i$, $i \in \mathbb{N}$, are increasingly ordered. That is, 
$\min\{n : n \in G_{i+1}\} > \max\{n : n \in G_i\}$, $i \in \mathbb{N}$; 

(2) $|G_i| \to \infty$ as $i \to \infty$, where $|\cdot|$ is the number of 
elements (integers) in a set; 

(3) $\frac{|[0, n-1] \cap G|}{n} \to c > 0$ as $n \to \infty$. 

Definition 3.3. An asymptotically proportional contraction of a 
numerical sequence $x = \{x_n\}_{n = 0, 1, \dots}$ is a subsequence 
$\{x_{n_i}\}_{n_i \in G}$ of $\{x_n\}_{n \in \mathbb{N}_0}$, where $G$ is an 
asymptotically proportional contraction of the index set $\mathbb{N}_0$. 

Intuitively, an asymptotically proportional contraction of a numerical 
sequence consists of pieces of the original sequence with length of the pieces 
going to infinity and the fraction of coverage converging to a fixed positive 
level. 
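
A small Python sketch may help to visualize Definitions 3.2 and 3.3. It builds one hypothetical index set $G$ in which the $i$-th block has length $i$ and is followed by a gap of the same length, so that the coverage tends to $c = 1/2$, and then contracts an alternating path along $G$. The block layout, the helper names `contraction_blocks`, `coverage` and `contract`, and the test path are assumptions made for illustration and do not appear in the original text.

```python
def contraction_blocks(num_blocks):
    """Hypothetical G: block i has length i and is followed by a gap of length i."""
    blocks, start = [], 0
    for i in range(1, num_blocks + 1):
        blocks.append(range(start, start + i))  # G_i = [start, start + i - 1]
        start += 2 * i                           # skip a gap as long as the block
    return blocks


def coverage(blocks, n):
    """|[0, n-1] intersected with G| / n."""
    return sum(len(range(b.start, min(b.stop, n))) for b in blocks) / n


def contract(x, blocks):
    """Subsequence of x indexed by G, with the retained pieces joined together."""
    return [x[j] for b in blocks for j in b if j < len(x)]


blocks = contraction_blocks(300)     # block lengths 1, 2, ..., 300
n = 300 * 301                        # blocks and gaps together fill [0, n-1]
x = [i % 2 for i in range(n)]        # alternating path 0, 1, 0, 1, ...
x_contracted = contract(x, blocks)

print(coverage(blocks, n))                      # fraction of indices kept
print(sum(x) / len(x))                          # frequency of 1s in x
print(sum(x_contracted) / len(x_contracted))    # frequency of 1s in the contraction
```

In this example the frequency of 1s in the contracted sequence stays close to the frequency in the original path, which is the kind of invariance that the characterization of ergodicity below is built on.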


Theorem 3.4. Let $x$ be a numerical sequence in $A_1$. Then $x \in A$ if and 
only if all of its asymptotically proportional contractions induce the same 
process as the original sequence. That is, for any asymptotically proportional 
contraction $x'$, $k \in \mathbb{N}$ and $I \in \mathcal{I}^k$, 

$$\lim_{n \to \infty} \frac{{N'}_k^I(n)}{n} = \lim_{n \to \infty} \frac{N_k^I(n)}{n},$$

where $N'$ is the counting function defined in the same way as previously but 
for the subsequence $x'$. 


To prove Theorem 3.4, let us first introduce the following lemma. A 
similar result was presented in Furstenberg (1960). However, the proof to 
be presented below is much simpler, due to the difference in the framework 
used in this paper and that used in Furstenberg (1960), and the fact that 
we only need a one-directional result. 


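Lemma 3.5. Let $x$ be a path in $A$, so that $Y^x$ is ergodic. Let 
$k \in \mathbb{N}$, $I = I_0 \times \dots \times I_{k-1} \in \mathcal{I}^k$ and 
$S_k^I = S_k^I(x)$ be defined as previously. Then for every $\epsilon > 0$, 
there is an $N$, such that the set 

$$R_{k,N}^I := \Big\{n \in \mathbb{N} : \Big|\frac{1}{N} \sum_{i=n}^{n+N-1} 1_{S_k^I}(i) - p_k^I\Big| > \epsilon\Big\}$$

has a density smaller than $\epsilon$, where the constant 
$p_k^I = P(Y_0^x \in I_0, \dots, Y_{k-1}^x \in I_{k-1})$. 

Proof. Notice that the existence of the density for the sets $R_{k,N}^I$: 

$$\lim_{n \to \infty} \frac{|R_{k,N}^I \cap [0, n-1]|}{n}$$

is guaranteed by Property E. Moreover, by the ergodicity of the path, the 
density of a set $R_{k,N}^I$ is exactly the probability of the corresponding 
event, namely, 

$$
\begin{aligned}
\lim_{n \to \infty} \frac{|R_{k,N}^I \cap [0, n-1]|}{n}
&= P\Big(\Big|\frac{1}{N} \sum_{i=0}^{N-1} \prod_{j=0}^{k-1} 1_{I_j}(Y_{i+j}) - p_k^I\Big| > \epsilon\Big) \\
&= P\Big(\Big|\frac{1}{N} \sum_{i=0}^{N-1} 1_{\{\theta^i \circ Y \in A_k^I\}} - p_k^I\Big| > \epsilon\Big),
\end{aligned}
$$

where $Y = Y^x$, $\theta$ is the shift operator, and $A_k^I$ is a subset of the 
path space $H$, defined as 

$$A_k^I = \{x \in H : x_i \in I_i,\ i = 0, \dots, k-1\}.$$

Assume that the result in Lemma 3.5 is not true. Then there is 
$\epsilon > 0$, such that for any $N \in \mathbb{N}$, either 
$\big\{n \in \mathbb{N} : \frac{1}{N} \sum_{i=n}^{n+N-1} 1_{S_k^I}(i) - p_k^I > \epsilon\big\}$ 
or 
$\big\{n \in \mathbb{N} : \frac{1}{N} \sum_{i=n}^{n+N-1} 1_{S_k^I}(i) - p_k^I < -\epsilon\big\}$ 
has a density which is greater than or equal to $\frac{\epsilon}{2}$. Without 
loss of generality, assume that the set 

$$\Big\{n \in \mathbb{N} : \frac{1}{N} \sum_{i=n}^{n+N-1} 1_{S_k^I}(i) - p_k^I > \epsilon\Big\}$$

has a density greater than or equal to $\frac{\epsilon}{2}$ for infinitely many 
$N \in \mathbb{N}$, denoted as $\{N_i\}_{i \in \mathbb{N}}$. By ergodicity of 
the path $x$, this implies that 

$$P\Big(\frac{1}{N_i} \sum_{j=0}^{N_i - 1} 1_{\{\theta^j \circ Y \in A_k^I\}} > p_k^I + \epsilon\Big) \ge \frac{\epsilon}{2}$$

for $i \in \mathbb{N}$. As a result, the event 

$$\Big\{\frac{1}{N} \sum_{j=0}^{N-1} 1_{\{\theta^j \circ Y \in A_k^I\}} > p_k^I + \epsilon \text{ for infinitely many } N\Big\}$$

has a probability greater than or equal to $\frac{\epsilon}{2}$. This implies 
that 

$$\limsup_{n \to \infty} \frac{1}{n} \sum_{i=1}^{n} 1_{\{\theta^i \circ Y \in A_k^I\}} \ge p_k^I + \epsilon$$

happens with a probability greater than or equal to $\frac{\epsilon}{2}$. 
However, since $Y$ is ergodic, 

$$\lim_{n \to \infty} \frac{1}{n} \sum_{i=1}^{n} 1_{\{\theta^i \circ Y \in A_k^I\}} = p_k^I$$

almost surely, which is a contradiction. Therefore we conclude that the 
assumption is invalid and the result in Lemma 3.5 holds. □ 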


Proof of Theorem 3.4. Assume $x \in A$. For $k \in \mathbb{N}$, 
$I = I_0 \times \dots \times I_{k-1} \in \mathcal{I}^k$, define $S_k^I(x)$ as 
previously. Let $x' = \{x_{n_i}\}_{n_i \in G}$ be an asymptotically 
proportional contraction of $x$, where $G = \bigcup_{i=1}^{\infty} G_i$ is the 
corresponding asymptotically proportional contraction of $\mathbb{N}_0$. To 
prove the “only if” direction, our goal is to prove that the set $S_k^I(x')$ 
has the same density as $S_k^I(x)$. Let 
$c = \lim_{n \to \infty} \frac{|[0, n-1] \cap G|}{n}$. By Lemma 3.5, for any 
$\epsilon > 0$, there exists $N$, such that the set 

$$R_{k,N}^I = \Big\{n \in \mathbb{N}_0 : \Big|\frac{1}{N} \sum_{j=n}^{n+N-1} 1_{S_k^I(x)}(j) - p_k^I\Big| > \epsilon\Big\}$$

has a density smaller than $\epsilon$. Hence, the upper density of 
$R_{k,N}^I$ in $G$, defined as 

$$\limsup_{n \to \infty} \frac{|R_{k,N}^I \cap [0, n-1] \cap G|}{|[0, n-1] \cap G|},$$

is smaller than $\frac{\epsilon}{c}$. Similar to $R_{k,N}^I$, one can define 

$${R'}_{k,N}^I = \Big\{n_i \in G : \Big|\frac{1}{N} \sum_{j=i}^{i+N-1} 1_{S_k^I(x')}(j) - p_k^I\Big| > \epsilon\Big\}.$$

Since the operation of contraction will join different segments of the original 
path together, ${R'}_{k,N}^I$ and $R_{k,N}^I$ will not completely agree in $G$. 
However, since $\lim_{n \to \infty} |G_n| = \infty$, the two sets will have the 
same upper density in $G$. Therefore, the upper density of ${R'}_{k,N}^I$ is 
also smaller than $\frac{\epsilon}{c}$. It is easy to see that 

$$
\begin{aligned}
\limsup_{n \to \infty} \frac{1}{n} \sum_{i=0}^{n-1} 1_{S_k^I(x')}(i)
&\le \limsup_{n \to \infty} \frac{|{R'}_{k,N}^I \cap [0, n-1] \cap G|}{|[0, n-1] \cap G|} \cdot 1 + 1 \cdot (p_k^I + \epsilon) \\
&< p_k^I + \epsilon\Big(1 + \frac{1}{c}\Big).
\end{aligned}
$$

Since $\epsilon$ can be arbitrarily small, we must have 

$$\limsup_{n \to \infty} \frac{1}{n} \sum_{i=0}^{n-1} 1_{S_k^I(x')}(i) \le p_k^I.$$

Symmetrically, 
$\liminf_{n \to \infty} \frac{1}{n} \sum_{i=0}^{n-1} 1_{S_k^I(x')}(i) \ge p_k^I$. 
Thus 

$$\lim_{n \to \infty} \frac{1}{n} \sum_{i=0}^{n-1} 1_{S_k^I(x')}(i) = p_k^I,$$

which shows that $S_k^I(x')$ always has the same density, which is also the 
density of $S_k^I(x)$. 


Conversely, assume that $x \in A_1$ but $x \notin A$. Thus $x$ induces a 
stationary process $Y = Y^x$, but it is not ergodic. Therefore there exist 
$p \in (0, 1)$ and stationary processes $Z$ and $W$ with distinct 
distributions, such that $F_Y = pF_Z + (1 - p)F_W$. In particular, there exist 
$k \in \mathbb{N}$ and $I = I_0 \times \dots \times I_{k-1} \in \mathcal{I}^k$, 
such that 
$z := P(Z_i \in I_i,\ i = 0, \dots, k-1) \neq P(W_i \in I_i,\ i = 0, \dots, k-1) =: w$. 
Without loss of generality, assume that $z > w$. Notice that since $x$ induces 
$Y$, 

$$\lim_{n \to \infty} \frac{|S_k^I(x) \cap [0, n-1]|}{n} = P(Y_i \in I_i,\ i = 0, \dots, k-1) = pz + (1 - p)w.$$

For $m \in \mathbb{N}$, define 

$$V_0 := \Big\{j \in \mathbb{N}_0 : \frac{|S_k^I(x) \cap [j, j+m-1]|}{m} \ge \frac{(1 + p)z + (1 - p)w}{2}\Big\}.$$

Intuitively, $V_0$ is the set of the starting points of the segments of length 
$m$ in $x$, for which the local density of the points in $S_k^I(x)$ is higher 
than or equal to $\frac{(1 + p)z + (1 - p)w}{2}$, which is a level between $z$ 
and $pz + (1 - p)w$. It is clear by the construction of $A_0$ that $V_0$ has a 
density. 

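Consider the process $Z$. Similar to $x$, we now have a random set 

$$S_k^I(Z) = \{n \ge 0 : Z_{n+i} \in I_i,\ i = 0, \dots, k-1\}.$$

Then 

$$
\begin{aligned}
z &= P(Z_i \in I_i,\ i = 0, \dots, k-1) = E\big(1_{S_k^I(Z)}(0)\big) = E\Big(\frac{1}{m} \sum_{j=0}^{m-1} 1_{S_k^I(Z)}(j)\Big) \\
&= E\Big(\frac{1}{m} \sum_{j=0}^{m-1} 1_{S_k^I(Z)}(j);\ \frac{1}{m} \sum_{j=0}^{m-1} 1_{S_k^I(Z)}(j) \ge \frac{(1 + p)z + (1 - p)w}{2}\Big) \\
&\quad + E\Big(\frac{1}{m} \sum_{j=0}^{m-1} 1_{S_k^I(Z)}(j);\ \frac{1}{m} \sum_{j=0}^{m-1} 1_{S_k^I(Z)}(j) < \frac{(1 + p)z + (1 - p)w}{2}\Big) \\
&\le P\Big(\frac{1}{m} \sum_{j=0}^{m-1} 1_{S_k^I(Z)}(j) \ge \frac{(1 + p)z + (1 - p)w}{2}\Big) \\
&\quad + \frac{(1 + p)z + (1 - p)w}{2}\, P\Big(\frac{1}{m} \sum_{j=0}^{m-1} 1_{S_k^I(Z)}(j) < \frac{(1 + p)z + (1 - p)w}{2}\Big) \\
&\le P\Big(\frac{1}{m} \sum_{j=0}^{m-1} 1_{S_k^I(Z)}(j) \ge \frac{(1 + p)z + (1 - p)w}{2}\Big) + \frac{(1 + p)z + (1 - p)w}{2}.
\end{aligned}
$$

Hence, we have 

$$
\begin{aligned}
P\Big(\frac{|S_k^I(Z) \cap [0, m-1]|}{m} \ge \frac{(1 + p)z + (1 - p)w}{2}\Big)
&= P\Big(\frac{1}{m} \sum_{j=0}^{m-1} 1_{S_k^I(Z)}(j) \ge \frac{(1 + p)z + (1 - p)w}{2}\Big) \\
&\ge z - \frac{(1 + p)z + (1 - p)w}{2} = \frac{(1 - p)(z - w)}{2}.
\end{aligned}
$$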


Since $Y$ is a mixture of $Z$ and $W$, we obtain 

$$P\Big(\frac{|S_k^I(Y) \cap [0, m-1]|}{m} \ge \frac{(1 + p)z + (1 - p)w}{2}\Big) \ge \frac{p(1 - p)(z - w)}{2}.$$

Then this implies that the density of the set $V_0$ is greater than or equal to 
$\frac{p(1 - p)(z - w)}{2}$, since $Y$ is generated by $x$. Denote the elements 
of $V_0$ in an increasing order as $V_0 = \{v_0, v_1, \dots\}$, and define a 
subset $V_1$ of $V_0$: 

$$V_1 = \{v_{im},\ i \in \mathbb{N}\}.$$

That is, we only take every $m$-th element in $V_0$ to form $V_1$. Then $V_1$ 
has a density which is larger than or equal to $\frac{p(1 - p)(z - w)}{2m}$. 
Moreover, the construction of $V_1$ guarantees that the intervals 
$[j, j+m-1]$, $j \in V_1$, are disjoint. We further take a subset of $V_1$, 
denoted as $V_2$, which has a density exactly equal to 
$\frac{p(1 - p)(z - w)}{2m}$. Finally, define 

$$H = \bigcup_{j \in V_2} [j, j+m-1];$$

then $H$ consists of disjoint intervals of integers, each with length (number 
of integers) $m$, and the set $H$ has density $\frac{p(1 - p)(z - w)}{2}$. 

Recall that $V_0$, $V_1$, $V_2$ and $H$ all depend on $m$, so we can also 
denote them respectively as $V_0(m)$, $V_1(m)$, $V_2(m)$ and $H(m)$. Notice, 
however, that the density of $H(m)$ does not depend on $m$. Now we construct an 
asymptotically proportional contraction $G$ of the index set $\mathbb{N}_0$ in 
the following inductive way: 

(1) Define the set $G(1) = H(1)$. Since $G(1)$ has a density given by 
$d := \frac{p(1 - p)(z - w)}{2}$, for any $\epsilon_1 > 0$, there exists 
$N(1) \in \mathbb{N}$, such that $N(1) \in G(1)$, and 

$$\Big|\frac{|G(1) \cap [0, n]|}{n + 1} - d\Big| \le \frac{\epsilon_1}{3}$$

for any $n \ge N(1)$. Moreover, since $H(2)$ also has a density given by 
$d$, we can take $N(1)$ large enough so that 

$$\Big|\frac{|H(2) \cap [0, n]|}{n + 1} - d\Big| \le \frac{\epsilon_1}{3}$$

for any $n \ge N(1)$. 

(2) Let $\{\epsilon_i\}$ be a sequence of positive numbers decreasing to 0. 
Assume that we already have a set $G(m)$ and a positive integer $N(m)$, where 
$G(m)$ consists of intervals of integers with lengths increasing to $m$, 
and has a density given by $d$; $N(m)$ is the endpoint of an interval 
with length $m$ in $G(m)$: $N(m) - i \in G(m)$, $i = 0, \dots, m-1$, and 
satisfies 

$$\Big|\frac{|G(m) \cap [0, n]|}{n + 1} - d\Big| \le \frac{\epsilon_m}{3}$$

and 

$$\Big|\frac{|H(m+1) \cap [0, n]|}{n + 1} - d\Big| \le \frac{\epsilon_m}{3}$$

for $n \ge N(m)$. Then define 

$$G(m+1) = \big(G(m) \cap [0, N(m)]\big) \cup \bigcup_{i \in V_2(m+1),\ i \ge N(m)+1} [i, i+m].$$

That is, $G(m+1)$ is obtained by joining the part of $G(m)$ before 
$N(m)$ and the part of $H(m+1)$ after $N(m)$, but the area around the 
joint point is modified so that only the whole intervals in $H(m+1)$ 
are kept. Notice that the set $G(m+1)$ defined in this way consists 
of intervals of integers with lengths increasing to $m+1$. Since both 
$H(m+1)$ and $H(m+2)$ have a density given by $d$, there exists 
$N(m+1) > N(m)$, such that $N(m+1) - i \in G(m+1)$, $i = 0, \dots, m$, 

$$\Big|\frac{|G(m+1) \cap [0, n]|}{n + 1} - d\Big| \le \frac{\epsilon_{m+1}}{3}$$

and 

$$\Big|\frac{|H(m+2) \cap [0, n]|}{n + 1} - d\Big| \le \frac{\epsilon_{m+1}}{3}$$

for $n \ge N(m+1)$. 

(3) Define $G$ by 

$$G = \lim_{m \to \infty} G(m) = \bigcup_{m=1}^{\infty} \big(G(m) \cap [N(m-1)+1, N(m)]\big),$$

where $N(0) = -1$. 


The set $G$ that we constructed consists of intervals of integers with lengths 
going to infinity. It is not difficult to see that we can make $G$ have a 
density given by $d$. Indeed, for $m \in \mathbb{N}$ and any 
$n \in [N(m-1)+1, N(m)]$, 

$$
\begin{aligned}
\Big|\frac{|G \cap [0, n]|}{n + 1} - d\Big|
&= \Big|\frac{|G(m) \cap [0, n]|}{n + 1} - d\Big| \\
&\le \Big|\frac{|H(m) \cap [0, n]|}{n + 1} - d\Big| + \Big|\frac{|G(m-1) \cap [0, N(m-1)]|}{N(m-1) + 1} - \frac{|H(m) \cap [0, N(m-1)]|}{N(m-1) + 1}\Big| + O(m/n) \\
&\le \Big|\frac{|H(m) \cap [0, n]|}{n + 1} - d\Big| + \Big|\frac{|G(m-1) \cap [0, N(m-1)]|}{N(m-1) + 1} - d\Big| \\
&\quad + \Big|\frac{|H(m) \cap [0, N(m-1)]|}{N(m-1) + 1} - d\Big| + O(m/n) \\
&\le \frac{\epsilon_{m-1}}{3} + \frac{\epsilon_{m-1}}{3} + \frac{\epsilon_{m-1}}{3} + O(m/n) \\
&= \epsilon_{m-1} + O(m/n).
\end{aligned}
$$

The error term $O(m/n)$ comes from the possible difference between $H(m)$ 
and $G(m)$ over $[N(m-1)+1, N(m)]$ due to the modification made around 
the joint point, and can be made arbitrarily small by taking $N(m-1)$ to be 
large enough. 

As a result, $G$ is an asymptotically proportional contraction of the index 
set $\mathbb{N}_0$. Moreover, by construction, it is clear that the lower 
density of $S_k^I(x)$ in $G$, defined as 

$$\liminf_{n \to \infty} \frac{|S_k^I(x) \cap G \cap [0, n-1]|}{|G \cap [0, n-1]|},$$

is greater than or equal to $\frac{(1 + p)z + (1 - p)w}{2}$. As before, let 
$x'$ be the asymptotically proportional contraction of $x$ determined by $G$. 
Then $S_k^I(x')$ will have the same limiting behavior as $S_k^I(x)$ restricted 
to $G$. Hence, either $S_k^I(x')$ has a density greater than or equal to 
$\frac{(1 + p)z + (1 - p)w}{2}$, or $S_k^I(x')$ does not have a density, 
while $S_k^I(x)$ has a density given by $pz + (1 - p)w$. Thus, we have found an 
asymptotically proportional contraction of $x$ which does not induce the same 
process as the original sequence $x$. □ 


4. Results of Stationarity Tests Applied to Paths in A 

The previous section shows that the set of functions A is large enough, 
such that any stationary process must put mass 1 on A. In this section, 
our goal is to show that the set A is also small enough, in the sense that it 
only contains the “essentially stationary” paths. To this end, we consider the 
stationarity tests applied to the paths in A, and prove that the results are 
the best that we can expect. 

Let $T$ be a hypothesis test for sample size $n$ and consider the null 
hypothesis $H_0$: $X = \{X_0, \dots, X_{n-1}\}$ is stationary, or more 
precisely, $H_0$: $X$ is from a stationary time series defined on 
$\mathbb{R}^{\mathbb{N}_0}$ or $\mathbb{R}^{\mathbb{Z}}$. In other words, $T$ 
is a mapping from $\mathbb{R}^n$ to $\{0, 1\}$, where 0 and 1 correspond to 
“acceptance” and “rejection” of the null hypothesis, respectively. 
Alternatively, $T$ can be represented as $1_{C_T}(x_0, \dots, x_{n-1})$, where 
$C_T \in \mathcal{C}_{\mathbb{R}^n}$ is the critical region (or, equivalently, 
the rejection region) of the test, $\mathcal{C}_{\mathbb{R}^n}$ being the 
cylindrical $\sigma$-field in $\mathbb{R}^n$. Define 

$$\alpha_T(P) = P(T(X) = 1) = P(C_T)$$

for $P \in \mathcal{P}_0$, the collection of stationary probability measures 
restricted to $\mathbb{R}^n$; then the size of the test $T$ is 

$$\alpha = \sup_{P \in \mathcal{P}_0} \alpha_T(P).$$

We further define $g_n = g_{n,0}$ to be the projection: 
$g_n(x) = (x_0, \dots, x_{n-1})$, $x \in \mathbb{R}^{\mathbb{N}_0}$, and 
$g_{n,i} := g_n \circ \theta^i$. Thus, $g_{n,i}$ is the operation of taking the 
moving window of size $n$ starting from $x_i$. 

Theorem 4.1. Let $x \in A$. Assume that $T$ is a given test for stationarity of 
size $\alpha$ and with a given sample size $n$. If one of the two following 
conditions is satisfied: 

(1) the critical region $C_T$ is closed; or 

(2) the boundary of the critical region, $\mathrm{bd}(C_T)$, is a null set 
under any $P \in \mathcal{P}_0$, 

then the upper density of the index set 

$$\{i \in \mathbb{N}_0 : g_{n,i}(x) \in C_T\}$$



16 


STATIONARITY TESTS FOR TIME SERIES 


is smaller than or equal to a. 

Theorem 4.1 shows that if we apply a “well-behaved” stationarity test, in
the sense that it satisfies one of the two conditions listed in the theorem,
to a moving window of length $n$ of any path $x$ in the set $A$, then the
limiting frequency with which the null hypothesis of stationarity is rejected
does not exceed the size of the test. Intuitively, this ensures that when we
apply a stationarity test to a path in $A$, we get the best result that we can
expect. More precisely, since the size $\alpha$ is a supremum over stationary
measures, the rejection probability under a true null hypothesis can be
arbitrarily close to $\alpha$. Then by the ergodic decomposition, for arbitrarily
small $\varepsilon > 0$, there exists an ergodic process for which the rejection rate
is larger than $\alpha - \varepsilon$. Interpreting ergodicity as the equivalence between
the mean across time and the mean across space, for a typical path of this
ergodic process, the null hypothesis is rejected with a limiting frequency
greater than $\alpha - \varepsilon$ as the window of length $n$ moves from the origin to
$+\infty$. Therefore a limiting frequency of rejection smaller than or equal to
$\alpha$ is the best that we can expect. Any stronger requirement would exclude
typical paths of certain stationary processes.
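The limiting rejection frequency over moving windows can be approximated on a finite stretch of a path. The sketch below is only illustrative: the periodic path, the toy test and the function names are assumptions, and a finite computation cannot certify an upper density or membership of a path in $A$.

```python
def rejection_frequency(x, n, test, m):
    """Fraction of the first m moving windows of length n that are rejected.

    This is a finite-m proxy: the upper density is the limsup of this
    quantity as m grows, which no finite computation can certify.
    """
    if m + n - 1 > len(x):
        raise ValueError("path too short for m windows of length n")
    rejected = sum(test(tuple(x[i:i + n])) for i in range(m))
    return rejected / m


# Toy periodic path x_i = i mod 4, chosen because its window frequencies
# can be computed by hand.
x = [i % 4 for i in range(4003)]


# Hypothetical test with n = 3: reject when the first entry of the window is 0.
def toy_test(window):
    return 1 if window[0] == 0 else 0


for m in (40, 400, 4000):
    print(m, rejection_frequency(x, 3, toy_test, m))   # 0.25 each time
```

For this path exactly one window in four starts with 0, so the frequency equals 0.25 whenever $m$ is a multiple of 4.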

The significance of Theorem 4.1 resides in the conclusion that if a path $x$
is known to belong to the set $A$, then it is “statistically indistinguishable”
from a typical path of a stationary process, in the sense that its performance
under any stationarity test satisfying the conditions of Theorem 4.1 will be
at least as good as that of a path from the stationary process. In other
words, we should not expect any statistical method to be able to discriminate
between $x$ and a typical path from some stationary process.

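Proof of Theorem 4.1. Let $x \in A$ and let $Y^x$ be the ergodic process that $x$
induces. Define

\[ \mathcal{J}_n = \Bigl\{ J \in \mathcal{C}_{\mathbb{R}^n} : \lim_{m\to\infty} \frac{\sum_{i=0}^{m-1} \mathbf{1}_J(g_{n,i}(x))}{m} = P(g_n(Y^x) \in J) \Bigr\}, \]

where $P$ is the stationary measure induced by $x$.

By the definition of the set $A$, $\mathcal{J}_n$ includes all of the $n$-dimensional cylinder
sets (i.e., open hypercubes). In other words, $\mathcal{T}_n \subset \mathcal{J}_n$, where $\mathcal{T}_n$ denotes the
collection of these cylinder sets. Moreover, $\mathcal{J}_n$ clearly satisfies the
following properties:

(1) $\emptyset \in \mathcal{J}_n$, $\mathbb{R}^n \in \mathcal{J}_n$;

(2) $J_1, J_2 \in \mathcal{J}_n$, $J_1 \supset J_2$ implies $J_1 \setminus J_2 \in \mathcal{J}_n$;

(3) $J_1, J_2 \in \mathcal{J}_n$, $J_1 \cap J_2 = \emptyset$ implies $J_1 \cup J_2 \in \mathcal{J}_n$.

That is, $\mathcal{J}_n$ is closed under proper differences and finite disjoint
unions. The following proposition is a simple consequence of the fact that the
Euclidean space $\mathbb{R}^n$ with its usual topology is complete and separable.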

Proposition 4.2. Let $C$ be a $\mathcal{C}_{\mathbb{R}^n}$-measurable set with interior $C^\circ$, and let $P$
be a probability measure on $(\mathbb{R}^n, \mathcal{C}_{\mathbb{R}^n})$. Then for any $\varepsilon > 0$, there exists
$J \in \mathcal{J}_n$, $J \subset C^\circ$, such that $P(J) > P(C^\circ) - \varepsilon$.

Proof. The proof of this proposition is elementary, so we only provide a
sketch. Consider the collection of all hypercubes whose faces are parallel to
the axes and whose vertices have rational coordinates. This is a countable
topological basis of $\mathbb{R}^n$ with its usual topology. Thus, for any $C$, its
interior $C^\circ$, as an open set, can be expressed as the (countable) union of
some members of this topological basis, denoted by $B_1, B_2, \dots$. For any
$\varepsilon > 0$, there exists a finite number $k(\varepsilon)$ such that
$P\bigl(\bigcup_{i=1}^{k(\varepsilon)} B_i\bigr) > P(C^\circ) - \varepsilon$. Repartitioning $\bigcup_{i=1}^{k(\varepsilon)} B_i$ into
finitely many disjoint hypercubes completes the proof. □
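The repartitioning step at the end of the sketch can be made concrete. One possible way, shown below as an assumption-laden illustration rather than the authors' construction, is to cut space along every coordinate at which some box starts or ends and to keep the resulting grid cells that lie inside at least one box. Box boundaries are ignored here, and all names are hypothetical.

```python
from itertools import product


def disjoint_cells(boxes):
    """Repartition a finite union of axis-parallel boxes into disjoint grid cells.

    Each box is a tuple of (low, high) pairs, one pair per coordinate.
    Boundaries are ignored (cells are treated as open), which is harmless
    whenever the boundaries carry no probability mass.
    """
    dim = len(boxes[0])
    # Every coordinate at which some box starts or ends becomes a grid line.
    cuts = [sorted({b[d][k] for b in boxes for k in (0, 1)}) for d in range(dim)]
    axes = [list(zip(c, c[1:])) for c in cuts]

    def inside(cell, box):
        return all(lo >= blo and hi <= bhi
                   for (lo, hi), (blo, bhi) in zip(cell, box))

    # A grid cell lies either entirely inside a box or entirely outside it.
    return [cell for cell in product(*axes)
            if any(inside(cell, box) for box in boxes)]


def volume(cell):
    """Product of the side lengths of a cell."""
    v = 1.0
    for lo, hi in cell:
        v *= hi - lo
    return v


boxes = [((0, 2), (0, 2)), ((1, 3), (1, 3))]   # two overlapping squares
cells = disjoint_cells(boxes)
print(len(cells), sum(volume(c) for c in cells))   # 7 7.0
```

The two squares have area 4 each and overlap in a unit square, so the union has area 7, which the seven disjoint unit cells reproduce.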

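The proof of Theorem 4.1 now becomes simple. Let $T$ be a given test of size
$\alpha$ with sample size $n$, and let $P$ be the stationary measure induced by
$x$. Hence $P(C_T) \le \alpha$. If $T$ satisfies one of the two conditions listed in the
theorem, then $P((C_T^c)^\circ) = P(C_T^c) \ge 1 - \alpha$, where $(C_T^c)^\circ$ is the interior of
$C_T^c$. Indeed, under condition (1) the complement $C_T^c$ is open and coincides
with its interior, while under condition (2) the two sets differ by a subset
of $\mathrm{bd}(C_T)$, which is a $P$-null set. For $\varepsilon > 0$, by Proposition 4.2 there exists
$J \in \mathcal{J}_n$, $J \subset (C_T^c)^\circ$, such that

\[ P(g_n(Y^x) \in J) > P(C_T^c) - \varepsilon \ge 1 - \alpha - \varepsilon. \]

Since $J \in \mathcal{J}_n$, the set $\{ i \in \mathbb{N}_0 : g_{n,i}(x) \in J \}$ has a density which is greater
than or equal to $1 - \alpha - \varepsilon$. This implies that $\{ i \in \mathbb{N}_0 : g_{n,i}(x) \in C_T^c \}$ has a
lower density which is greater than or equal to $1 - \alpha - \varepsilon$. Since $\varepsilon$ can be
taken arbitrarily small, the lower density of $\{ i \in \mathbb{N}_0 : g_{n,i}(x) \in C_T^c \}$ is at
least $1 - \alpha$. In other words, the upper density of $\{ i \in \mathbb{N}_0 : g_{n,i}(x) \in C_T \}$ is
smaller than or equal to $\alpha$. □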

In practice, most stationarity tests introduce additional assumptions on
the stochastic processes (time series) in their null or alternative
hypotheses, either in constructing the tests or in analyzing their power.
A close examination of the proof of Theorem 4.1 reveals that such additional
assumptions do not affect the result of the theorem. That is, if we can
check that the process $Y^x$ satisfies the additional assumptions of a test, then
applying the test to a moving window of the path $x \in A$ will still lead to a
limiting frequency of rejection no larger than the size of the test. Intuitively,
the fact that the path $x$ is in $A$ still guarantees stationarity; if the test
results in a higher frequency of rejection, this is due to a violation of the
additional assumptions rather than evidence of non-stationarity.

On the other hand, the two conditions in Theorem 4.1 are very general.
As a matter of fact, a good test should have $\mathrm{bd}(C_T)$ as a null set under
the null hypothesis, and this is almost always the case in practice.
Consequently, many prior studies do not even specify the openness/closedness
of the critical region. It is not difficult to check that all of the stationarity
tests mentioned in the Introduction satisfy the conditions of Theorem 4.1. Thus,
following our discussion of the additional assumptions, the result of
Theorem 4.1 applies to all of these tests. In some sense, what we have shown is
that all of the existing time series tests for stationarity reduce to checking
whether or not the given path is in the set $A$.

The above results are for tests with a fixed sample size. Next we discuss
two types of asymptotic behavior of paths in $A$.

The first kind of asymptotic behavior does not require any additional
assumption or technical result. Many stationarity tests used in practice do
not have a known exact size, but only an asymptotic size. In other words,
there are sequences of tests with sample sizes $n$ increasing to infinity, such
that although the size for any test with a fixed sample size is unknown,
there exists a limiting size as $n \to \infty$. In this case, Theorem 4.1 immediately
allows us to claim the following result.

Corollary 4.3. Let $x \in A$. Assume that $\{T_n\}_{n \in \mathbb{N}}$ is a sequence of tests for
stationarity, where $T_n$ is for sample size $n$ and has size $\alpha_n$. If
$\lim_{n\to\infty} \alpha_n = \alpha$, and for each $n \in \mathbb{N}$, one of the two conditions in Theorem 4.1
is satisfied by the critical region $C_{T_n}$ of $T_n$, then for any $\varepsilon > 0$, there
exists $N_\varepsilon \in \mathbb{N}$, such that the upper density of the index set

\[ \{ i \in \mathbb{N}_0 : g_{n,i}(x) \in C_{T_n} \} \]

is smaller than $\alpha + \varepsilon$ for any $n > N_\varepsilon$.
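The step from Theorem 4.1 to the corollary can be spelled out as follows. The symbol $\overline{d}$ for the upper density of an index set is introduced here only for brevity and is an assumption of this sketch, not notation taken from the paper.

```latex
% \overline{d}(S) denotes the upper density of an index set S (notation assumed here).
\begin{align*}
&\text{Since } \alpha_n \to \alpha, \text{ for any } \varepsilon > 0 \text{ there is } N_\varepsilon \in \mathbb{N}
  \text{ with } \alpha_n < \alpha + \varepsilon \text{ for all } n > N_\varepsilon. \\
&\text{For each such } n, \text{ Theorem 4.1 applied to } T_n \text{ gives} \\
&\qquad \overline{d}\bigl(\{ i \in \mathbb{N}_0 : g_{n,i}(x) \in C_{T_n} \}\bigr)
  \le \alpha_n < \alpha + \varepsilon .
\end{align*}
```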

The second kind of asymptotic result is more challenging. For a fixed path
$x$, we apply stationarity tests to a longer and longer initial segment of the
path, always starting from the first term $x_0$, and look at the limiting
behavior of the results of these tests. Such limiting results are typically
strong and require more assumptions on the tests, as well as some more
powerful technical tools. To obtain the results, it is helpful to consider the
cylindrical $\sigma$-field $\mathcal{C}$ over the whole path space $\mathbb{R}^{\mathbb{Z}}$, and define

\[ \mathcal{J} = \Bigl\{ J \in \mathcal{C} : \lim_{m\to\infty} \frac{\sum_{i=0}^{m-1} \mathbf{1}_J(\theta^i x)}{m} = P(Y^x \in J) \Bigr\}, \]

where $\theta$ is the shift operator, so that $\mathcal{C}$ and $\mathcal{J}$ do not correspond to any fixed
$n$. We can improve Proposition 4.2 to the following result.


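Proposition 4.4. Let $C$ be a $\mathcal{C}$-measurable set with interior $C^\circ$, and let $P$ be
a probability measure on $(\mathbb{R}^{\mathbb{Z}}, \mathcal{C})$. Then for any $\varepsilon > 0$, there exists $J \in \mathcal{J}$,
$J \subset C^\circ$, such that $P(J) > P(C^\circ) - \varepsilon$.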


Proof. For each $n \in \mathbb{N}$, let $\mathcal{J}_n$ be defined as in the proof of Theorem 4.1.
Denote by $\mathcal{Q}_n$ the collection of the sets $C \in \mathcal{C}_{\mathbb{R}^n}$ satisfying: for any $\varepsilon > 0$,
there exists $J \in \mathcal{J}_n$, $J \subset C$, such that $P(J) > P(C) - \varepsilon$. Clearly, $\mathcal{J}_n \subset \mathcal{Q}_n$.
In particular, all of the $n$-dimensional open hypercubes are in $\mathcal{Q}_n$. Indeed, it
is not difficult to verify that all of the $n$-dimensional hypercubes, regardless
of the openness/closedness of their boundaries, are in $\mathcal{J}_n \subset \mathcal{Q}_n$. Moreover,
$\mathcal{Q}_n$ is closed under finite disjoint unions. To see this, let $C_1, \dots, C_m$ be
disjoint sets in $\mathcal{Q}_n$ and let $C = \bigcup_{i=1}^{m} C_i$. By the definition of $\mathcal{Q}_n$, there
exist $J_1, \dots, J_m \in \mathcal{J}_n$ with $J_i \subset C_i$ and $P(J_i) > P(C_i) - \varepsilon_i$, where
$\varepsilon_i = 2^{-i}\varepsilon$, $i = 1, \dots, m$. Then $J = \bigcup_{i=1}^{m} J_i$ is in $\mathcal{J}_n$, $J \subset C$, and satisfies
$P(J) > P(C) - \varepsilon$. Denote by $\mathcal{F}_n$ the field generated by the $n$-dimensional
hypercubes. Then a result in Billingsley (1995) shows that each member of
$\mathcal{F}_n$ can be expressed as a finite union of disjoint hypercubes. As a result,
$\mathcal{F}_n \subset \mathcal{Q}_n$.

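Next we prove that $\mathcal{F}_n \subset \mathcal{J}_n$. Note that $\mathbb{R}^n \in \mathcal{F}_n \cap \mathcal{J}_n$, and $C \in \mathcal{F}_n \cap \mathcal{J}_n$
implies $C^c \in \mathcal{F}_n \cap \mathcal{J}_n$. Furthermore, $\mathcal{F}_n \cap \mathcal{J}_n$ is closed under union. Indeed,
let $C_1, C_2 \in \mathcal{F}_n \cap \mathcal{J}_n$. Then $C_1 \cup C_2$ and $(C_1 \cup C_2)^c$ are both in $\mathcal{F}_n \subset \mathcal{Q}_n$.
Consequently, for each $\varepsilon > 0$, there exist $J_{1,\varepsilon}, J_{2,\varepsilon} \in \mathcal{J}_n$, $J_{1,\varepsilon} \subset C_1 \cup C_2$,
$J_{2,\varepsilon} \subset (C_1 \cup C_2)^c$, such that $P(J_{1,\varepsilon}) > P(C_1 \cup C_2) - \varepsilon$ and
$P(J_{2,\varepsilon}) > P((C_1 \cup C_2)^c) - \varepsilon$. Therefore, we have

\[
\begin{aligned}
P(C_1 \cup C_2) - \varepsilon &< P(J_{1,\varepsilon}) \\
&= \lim_{m\to\infty} \frac{\sum_{i=0}^{m-1} \mathbf{1}_{J_{1,\varepsilon}}(g_{n,i}(x))}{m} \\
&\le \liminf_{m\to\infty} \frac{\sum_{i=0}^{m-1} \mathbf{1}_{C_1 \cup C_2}(g_{n,i}(x))}{m}.
\end{aligned}
\]

Letting $\varepsilon \to 0$ leads to the following result:

\[ \liminf_{m\to\infty} \frac{\sum_{i=0}^{m-1} \mathbf{1}_{C_1 \cup C_2}(g_{n,i}(x))}{m} \ge P(C_1 \cup C_2). \]

Symmetrically, using $J_{2,\varepsilon}$ we have

\[ \liminf_{m\to\infty} \frac{\sum_{i=0}^{m-1} \mathbf{1}_{(C_1 \cup C_2)^c}(g_{n,i}(x))}{m} \ge P((C_1 \cup C_2)^c), \]

which is equivalent to the upper limit of the frequency for $C_1 \cup C_2$ being at
most $P(C_1 \cup C_2)$. Thus,
$\lim_{m\to\infty} \frac{1}{m} \sum_{i=0}^{m-1} \mathbf{1}_{C_1 \cup C_2}(g_{n,i}(x))$ exists and is equal to $P(C_1 \cup C_2)$. Hence
$C_1 \cup C_2 \in \mathcal{F}_n \cap \mathcal{J}_n$, and $\mathcal{F}_n \cap \mathcal{J}_n$ is a field. Since $\mathcal{F}_n$ is the field generated by
the $n$-dimensional hypercubes, and all of the hypercubes are in both $\mathcal{F}_n$ and
$\mathcal{J}_n$, we must have $\mathcal{F}_n \subset \mathcal{J}_n$.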

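Finally, let $\mathcal{F} = \bigcup_{n \in \mathbb{N}} \mathcal{F}_n$ be the field on $\mathbb{R}^{\mathbb{Z}}$ generated by all cylinder
sets. Notice that since any member of $\mathcal{F}$ only has a finite number of
finite-dimensional constraints, $\mathcal{F} \subset \bigcup_{n \in \mathbb{N}} \mathcal{J}_n \subset \mathcal{J}$. Denote by $\mathcal{C}'$ the collection
of sets $C$ in $\mathcal{C}$ satisfying: for each $\varepsilon > 0$, there exists $J \in \mathcal{F}$, $J \subset C$, such
that $P(J) > P(C) - \varepsilon$. By definition, it is easy to see that $\mathcal{C}'$ contains
$\emptyset$ and $\mathbb{R}^{\mathbb{Z}}$. Moreover, let $C_1, C_2, \dots \in \mathcal{C}'$. Then for any $\varepsilon > 0$, there exist
$N \in \mathbb{N}$ and $J_1, \dots, J_N \in \mathcal{F}$ with $J_i \subset C_i$, such that
$P\bigl(\bigcup_{i=N+1}^{\infty} C_i \setminus \bigcup_{i=1}^{N} C_i\bigr) < \varepsilon/2$ and
$P(J_i) > P(C_i) - 2^{-i-1}\varepsilon$ for $i = 1, \dots, N$. The set $J = \bigcup_{i=1}^{N} J_i$ is in $\mathcal{F}$ and
satisfies $P(J) > P\bigl(\bigcup_{i \in \mathbb{N}} C_i\bigr) - \varepsilon$. Hence $\bigcup_{i \in \mathbb{N}} C_i \in \mathcal{C}'$. Similarly, it is easy to
see that $\mathcal{C}'$ is closed under finite intersections. As a result, $\mathcal{C}'$ is a topology.
Therefore it contains the topology generated by $\mathcal{F}$, which is the natural
topology on $\mathbb{R}^{\mathbb{Z}}$. Thus, we can conclude that for any $\mathcal{C}$-measurable set $C$, for
the open set $C^\circ$, there exists a set $J \in \mathcal{F} \subset \mathcal{J}$, such that $P(J) > P(C^\circ) - \varepsilon$. □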


Proposition 4.2 and its consequence, Theorem 4.1, show that for any time
series stationarity test with a fixed sample size satisfying some mild
conditions, a path in the set $A$ will behave as well as a typical path from a
stationary process. Proposition 4.4 allows us to generalize this statement to
any asymptotic property. For instance, let $\{T_n\}_{n \in \mathbb{N}}$ be a sequence of
stationarity tests with sample sizes $n$ and satisfying the conditions in
Theorem 4.1. At the risk of abusing notation, we also use $T_n$ for the
corresponding test statistics. Then for $x \in A$, the limiting behavior of $T_n(x)$
as $n \to \infty$ will be comparable to that of $Y^x$, which is a stationary process.


Example 4.5. If for any stationary time series $X$, the limiting rejection
rate of $T_n$,

\[ \lim_{n\to\infty} \frac{\sum_{i=1}^{n} T_i(g_i(X))}{n}, \]

almost surely exists and is bounded from above by a constant $\alpha$, then
Proposition 4.4 implies that for any $x \in A$ and generic $m \in \mathbb{N}$,

\[ \lim_{n\to\infty} \frac{\sum_{i=1}^{n} T_i(g_{i,m}(x))}{n} \]

exists and is bounded from above by $\alpha$. Here “generic” means that the set of
$m$ for which the result does not hold has limiting density 0 in $\mathbb{N}$. If the
assumption is relaxed to the existence of the upper/lower limit of the
rejection rate and their bounds, the corresponding results hold as well for
the paths in $A$.